U-Time: A Fully Convolutional Network for Time Series Segmentation Applied to Sleep Staging
Neural networks are increasingly popular for the analysis of physiological
time series. The most successful deep learning systems in this domain combine
convolutional and recurrent layers to extract useful features and model
temporal relations. Unfortunately, these recurrent models are difficult to
tune and optimize. In our experience, they often require task-specific
modifications, which makes them challenging to use for non-experts. We propose
U-Time, a fully feed-forward deep learning approach to physiological time
series segmentation developed for the analysis of sleep data. U-Time is a
temporal fully convolutional network based on the U-Net architecture that was
originally proposed for image segmentation. U-Time maps sequential inputs of
arbitrary length to sequences of class labels on a freely chosen temporal
scale. This is done by implicitly classifying every individual time-point of
the input signal and aggregating these classifications over fixed intervals to
form the final predictions. We evaluated U-Time for sleep stage classification
on a large collection of sleep electroencephalography (EEG) datasets. In all
cases, we found that U-Time reaches or outperforms current state-of-the-art
deep learning models while being much more robust in the training process and
without requiring architecture or hyperparameter adaptation across tasks.
Comment: To appear in Advances in Neural Information Processing Systems
(NeurIPS), 2019
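The point-wise classification and fixed-interval aggregation described in the abstract can be sketched as follows (a minimal numpy illustration, not the authors' implementation; the window length and toy probabilities are assumptions):

```python
import numpy as np

def aggregate_predictions(pointwise_probs, window):
    """Aggregate dense per-time-point class probabilities into one label
    per fixed-length window (e.g., a 30-second sleep epoch).

    pointwise_probs: array of shape (T, C) with per-time-point class
    probabilities; T must be divisible by `window`.
    """
    t, c = pointwise_probs.shape
    segments = pointwise_probs.reshape(t // window, window, c)
    # Average the probabilities over each window, then take the argmax.
    return segments.mean(axis=1).argmax(axis=1)

# Toy example: 6 time-points, 2 classes, windows of 3 time-points.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
                  [0.2, 0.8], [0.1, 0.9], [0.4, 0.6]])
labels = aggregate_predictions(probs, window=3)  # one label per window
```

Averaging the soft probabilities before the argmax, rather than majority-voting hard per-time-point labels, lets confident time-points outweigh ambiguous ones within a window.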
The International Workshop on Osteoarthritis Imaging Knee MRI Segmentation Challenge: A Multi-Institute Evaluation and Analysis Framework on a Standardized Dataset
Purpose: To organize a knee MRI segmentation challenge for characterizing the
semantic and clinical efficacy of automatic segmentation methods relevant for
monitoring osteoarthritis progression.
Methods: A dataset partition consisting of 3D knee MRI from 88 subjects at
two timepoints with ground-truth articular (femoral, tibial, patellar)
cartilage and meniscus segmentations was standardized. Challenge submissions
and a majority-vote ensemble were evaluated using Dice score, average symmetric
surface distance, volumetric overlap error, and coefficient of variation on a
hold-out test set. Similarities in network segmentations were evaluated using
pairwise Dice correlations. Articular cartilage thickness was computed per-scan
and longitudinally. Correlation between thickness error and segmentation
metrics was measured using Pearson's coefficient. Two empirical upper bounds
for ensemble performance were computed using combinations of model outputs that
consolidated true positives and true negatives.
Results: Six teams (T1-T6) submitted entries for the challenge. No
significant differences were observed across all segmentation metrics for all
tissues (p=1.0) among the four top-performing networks (T2, T3, T4, T6). Dice
correlations between network pairs were high (>0.85). Per-scan thickness errors
were negligible among T1-T4 (p=0.99) and longitudinal changes showed minimal
bias (<0.03mm). Low correlations (<0.41) were observed between segmentation
metrics and thickness error. The majority-vote ensemble was comparable to top
performing networks (p=1.0). Empirical upper bound performances were similar
for both combinations (p=1.0).
Conclusion: Diverse networks learned to segment the knee similarly where high
segmentation accuracy did not correlate to cartilage thickness accuracy. Voting
ensembles did not outperform individual networks but may help regularize
individual models.
Comment: Submitted to Radiology: Artificial Intelligence
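The Dice overlap and majority-vote ensemble used in the evaluation above are simple to state; a minimal sketch (assuming binary masks as numpy arrays, not the challenge's actual evaluation code) is:

```python
import numpy as np

def dice(a, b):
    """Dice overlap between two binary masks; defined as 1.0 when both
    masks are empty."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

def majority_vote(masks):
    """Pixel-wise strict-majority vote over a list of binary masks."""
    stacked = np.stack([m.astype(int) for m in masks])
    return stacked.sum(axis=0) * 2 > len(masks)

# Toy 2x2 masks from three hypothetical networks.
m1 = np.array([[1, 1], [0, 0]])
m2 = np.array([[1, 0], [0, 0]])
m3 = np.array([[1, 1], [1, 0]])
ens = majority_vote([m1, m2, m3])  # keeps pixels chosen by >= 2 of 3 masks
```

A strict-majority ensemble can only suppress idiosyncratic errors of individual networks, which is consistent with the abstract's finding that it regularizes rather than outperforms the top models.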
The Medical Segmentation Decathlon
International challenges have become the de facto standard for comparative
assessment of image analysis algorithms given a specific task. Segmentation is
so far the most widely investigated medical image processing task, but the
various segmentation challenges have typically been organized in isolation,
such that algorithm development was driven by the need to tackle a single
specific clinical problem. We hypothesized that a method capable of performing
well on multiple tasks will generalize well to a previously unseen task and
potentially outperform a custom-designed solution. To investigate the
hypothesis, we organized the Medical Segmentation Decathlon (MSD) - a
biomedical image analysis challenge, in which algorithms compete in a multitude
of both tasks and modalities. The underlying data set was designed to explore
the axis of difficulties typically encountered when dealing with medical
images, such as small data sets, unbalanced labels, multi-site data and small
objects. The MSD challenge confirmed that algorithms with consistently good
performance on a set of tasks preserved their good average performance on a
different set of previously unseen tasks. Moreover, by monitoring the MSD
winner for two years, we found that this algorithm continued generalizing well
to a wide range of other clinical problems, further confirming our hypothesis.
Three main conclusions can be drawn from this study: (1) state-of-the-art image
segmentation algorithms are mature, accurate, and generalize well when
retrained on unseen tasks; (2) consistent algorithmic performance across
multiple tasks is a strong surrogate of algorithmic generalizability; (3) the
training of accurate AI segmentation models is now commoditized to non-AI
experts.
The Liver Tumor Segmentation Benchmark (LiTS)
In this work, we report the set-up and results of the Liver Tumor
Segmentation Benchmark (LITS) organized in conjunction with the IEEE
International Symposium on Biomedical Imaging (ISBI) 2016 and the
International Conference on Medical Image Computing and Computer-Assisted
Intervention (MICCAI) 2017. Twenty-four valid state-of-the-art liver and
liver tumor segmentation algorithms were applied to a set of 131 computed
tomography (CT) volumes with different types of tumor contrast
(hyper-/hypo-intense), tissue abnormalities (e.g., after metastasectomy), and
lesions of varying size and number. The submitted algorithms were tested on
70 undisclosed volumes. The dataset was created in collaboration with seven
hospitals and research institutions and manually reviewed by three
independent radiologists. We found that no single algorithm performed best
for both liver and tumor segmentation. The best liver segmentation algorithm
achieved a Dice score of 0.96 (MICCAI), whereas for tumor segmentation the
best algorithms achieved Dice scores of 0.67 (ISBI) and 0.70 (MICCAI). The
LITS image data and manual annotations continue to be publicly available
through an online evaluation system as an ongoing benchmarking resource.
Cross-Cohort Automatic Knee MRI Segmentation With Multi-Planar U-Nets
Background: Segmentation of medical image volumes is a time-consuming manual
task. Automatic tools are often tailored toward specific patient cohorts, and
it is unclear how they behave in other clinical settings.
Purpose: To evaluate the performance of the open-source Multi-Planar U-Net
(MPUnet), the validated Knee Imaging Quantification (KIQ) framework, and a
state-of-the-art two-dimensional (2D) U-Net architecture on three clinical
cohorts without extensive adaptation of the algorithms.
Study Type: Retrospective cohort study.
Subjects: A total of 253 subjects (146 females, 107 males, ages 57 ± 12
years) from three knee osteoarthritis (OA) studies (Center for Clinical and
Basic Research [CCBR], Osteoarthritis Initiative [OAI], and Prevention of OA
in Overweight Females [PROOF]) with varying demographics and OA severity
(64/37/24/53/2 scans of Kellgren and Lawrence [KL] grades 0–4).
Field Strength/Sequence: 0.18 T, 1.0 T/1.5 T, and 3 T sagittal
three-dimensional fast-spin-echo T1w and dual-echo steady-state sequences.
Assessment: All models were fit without tuning to knee magnetic resonance
imaging (MRI) scans with manual segmentations from the three clinical
cohorts. All models were evaluated across KL grades.
Statistical Tests: Segmentation performance differences as measured by Dice
coefficients were tested with paired, two-sided Wilcoxon signed-rank
statistics with significance threshold α = 0.05.
Results: The MPUnet performed on par with or better than KIQ and 2D U-Net on
all compartments across the three cohorts. Mean Dice overlap was
significantly higher for the MPUnet than for KIQ and U-Net on CCBR ((Formula
presented.) vs. (Formula presented.) and (Formula presented.)), significantly
higher than KIQ and U-Net on OAI ((Formula presented.) vs. (Formula
presented.) and (Formula presented.)), and not significantly different from
KIQ while significantly higher than 2D U-Net on PROOF ((Formula presented.)
vs. (Formula presented.), (Formula presented.), and (Formula presented.)).
The MPUnet performed significantly better on (Formula presented.) KL grade 3
CCBR scans with (Formula presented.) vs. (Formula presented.) for KIQ and
(Formula presented.) for 2D U-Net.
Data Conclusion: The MPUnet matched or exceeded the performance of
state-of-the-art knee MRI segmentation models across cohorts with variable
sequences and patient demographics. The MPUnet required no manual tuning,
making it both accurate and easy to use.
Level of Evidence: 3. Technical Efficacy: Stage 2.
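The paired, two-sided Wilcoxon signed-rank test at α = 0.05 named under Statistical Tests can be run with scipy; a minimal sketch with made-up per-subject Dice scores (the numbers are illustrative, not the study's data):

```python
import numpy as np
from scipy.stats import wilcoxon

# Illustrative per-subject Dice scores for two segmentation models on the
# same eight scans (hypothetical values, not taken from the study).
model_a = np.array([0.91, 0.92, 0.93, 0.94, 0.90, 0.95, 0.89, 0.96])
model_b = np.array([0.90, 0.89, 0.89, 0.88, 0.85, 0.87, 0.82, 0.87])

# Paired, two-sided Wilcoxon signed-rank test at alpha = 0.05: ranks the
# absolute per-subject differences and compares the signed rank sums.
stat, p = wilcoxon(model_a, model_b, alternative="two-sided")
significant = p < 0.05
```

With only eight pairs and no tied differences, scipy evaluates the exact null distribution; since every difference here favors `model_a`, the negative-rank sum is zero and the test rejects at α = 0.05.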